Any tips to help a scientist become a better programmer?

@mypasswordistaco@iusearchlinux.fyi · 1 year ago

Any tips to help a scientist become a better programmer?

@owenfromcanada@lemmy.world · 1 year ago

While there are lots of programming courses out there, not many of them will explicitly teach you about good programming principles. Here are a couple things off the top of my head:

High cohesion, low coupling. That is, when you divide up code into functions and classes, try to minimize the number of things going between those functions (if your functions regularly have 6+ arguments, that’s a red flag and should be reviewed). And when something needs to be broken up into pieces, try to find the spots where there are minimal points of contact.
Try to divide code between functions and files in a way that doesn’t feel too busy. If there are a bunch of related functions that are cluttering up one file, or that are referenced from multiple places, consider making a module for those. If you’re not sure what “too busy” means…
Read a style guide. There are lots of things that will help you clean up and organize your code. The guide won’t necessarily tell you why to do each thing, but it’s a great tool when you don’t have another point of reference.

If you have a chance to take a “Software Engineering 101” class, this is where you’d learn most of the basic principles for writing better code.
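As a rough illustration of the “6+ arguments” red flag, here is a minimal sketch in Python. All names (`summarize_loose`, `SummarySettings`, `summarize`) and the data are made up for this example; the point is only how grouping related settings shrinks the number of things passed between functions.

```python
from dataclasses import dataclass
from statistics import mean


# Before: seven loose arguments; every caller must know and order all of them.
def summarize_loose(values, lower, upper, scale, offset, digits, label):
    kept = [v * scale + offset for v in values if lower <= v <= upper]
    return f"{label}: {round(mean(kept), digits)}"


# After: the related settings travel together in one object (hypothetical name),
# so the function has fewer points of contact with its callers.
@dataclass(frozen=True)
class SummarySettings:
    lower: float = float("-inf")
    upper: float = float("inf")
    scale: float = 1.0
    offset: float = 0.0
    digits: int = 2


def summarize(values, label, settings=SummarySettings()):
    kept = [
        v * settings.scale + settings.offset
        for v in values
        if settings.lower <= v <= settings.upper
    ]
    return f"{label}: {round(mean(kept), settings.digits)}"


readings = [1.0, 2.0, 3.0, 250.0]  # made-up data; 250.0 is an obvious outlier
print(summarize(readings, "temp", SummarySettings(upper=100.0)))
```

Callers now only mention the settings they actually change, and adding a new setting later does not break existing calls.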

Savaran · 1 year ago

Approach programming with the same seriousness that you’d expect a programmer to approach your field with. You say yourself you just want it to “do the thing, conventions be damned”.

Well how would you feel if someone entered your lab or whatever and treated the tools of your trade that way?

@owsei@programming.dev · 1 year ago

I would agree if OP was trying to get a job as a developer, however I don’t think they are.

It’s more like you used a beaker for something and shook it to mix water and salt, it’s not the recommended way, but it’s fine.

@groucho@lemmy.sdf.org · 1 year ago

At some point, they’re gonna have to debug it.

@ericjmorey@programming.dev · 1 year ago

This isn’t a good assumption for researchers.

@Andy@programming.dev · 1 year ago

Two books that may be helpful:

Fluent Python by Luciano Ramalho
Python Distilled by David M. Beazley

I’m more familiar with the former, and think it’s very good, but it may not give you the basic introduction to object oriented programming (classes and all that) you’re looking for; the latter should.

Vahtos · 1 year ago

This is only tangentially related to improving your code directly as you have asked. However, in a similar vein as using source control (git), when using Python learn to manage your environments. Venv, poetry, conda/mamba, etc are tools to look into.

I used to work with mostly scientists, and a good number of them knew some Python, but none of them knew how to properly manage their environments and it was a huge problem. They would often come to me and say “I ran this script a week ago and it worked, I tried it today without making any changes and it’s throwing this error now that I don’t understand.” Every time it was because they accidentally changed their dependencies, using their global Python install. It also made it a nightmare to try to revive old code for them, since there was almost no way to know which versions of the various libraries were used.
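One cheap habit that helps with the “which versions was this run with?” problem is saving a snapshot of the environment next to the results. The following is a minimal sketch using only the standard library; the function name `environment_snapshot` is made up, and it is not a replacement for a real environment manager.

```python
import json
import sys
from importlib import metadata


def environment_snapshot():
    """Collect the interpreter version and the installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        # True inside a venv-style environment; conda environments are
        # not detected by this particular check.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "packages": dict(sorted(packages.items())),
    }


snapshot = environment_snapshot()
print(f"Python {snapshot['python']}, {len(snapshot['packages'])} packages recorded")

# To keep the record, write it wherever the script stores its output, e.g.:
# with open("environment_snapshot.json", "w") as handle:  # hypothetical file name
#     json.dump(snapshot, handle, indent=2)
```

If `in_virtualenv` comes back `False` and you are not using conda, the script is probably running against the global install described above.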

@ericjmorey@programming.dev · edit-2 1 year ago

This is huge. Unfortunately, as you indicated, there’s no standard tool for this and new ones are being added to the mix. Many in the science fields are pushed towards Conda but I’m not sure it’s the best option. However, Conda will be infinitely better than not using anything to manage environments and dependencies.

@agent_flounder@lemmy.world · 1 year ago

As the other commenter said, you want to learn about programming principles. Like, low coupling or don’t repeat yourself.

How long is your longest program? What would you say is a typical length?

You say your code is “bad” – in what ways? For example:

Readability (e.g. going back to it months later so you go “oh I remember” or “wtf does this do?!”)
Maintainability (go back to update and you have to totally rework a bunch of stuff for a change that seems like it should be simple)
Reliability (mistakes, haphazard “testing”, can’t trust output)
Maybe something else?
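On the reliability point, one low-effort step up from haphazard “testing” is to keep a few checks with hand-computed answers right next to the function. A minimal sketch, with `normalize` as a made-up example function:

```python
def normalize(values):
    """Scale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("cannot normalize: all values are identical")
    span = highest - lowest
    return [(v - lowest) / span for v in values]


def test_normalize():
    # Known input with an answer worked out by hand.
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    # The degenerate case should fail loudly instead of dividing by zero.
    try:
        normalize([3.0, 3.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for constant input")


test_normalize()
print("all checks passed")
```

Functions named `test_...` like this can also be collected by a test runner such as pytest if they live in a file it recognises as a test file, but calling them by hand already catches a lot.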

@wargreymon2023@sopuli.xyz · edit-2 1 year ago

Think two things:

optimize the control flow of your code
make it easy to read

Be disciplined about these two ideas and your code will look better as you become more experienced, 100% guaranteed.
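“Optimize the control flow” can mean several things. One common reading, and it is only an assumption that this is what was meant, is flattening nested conditionals with early exits. A small sketch with made-up function names:

```python
def mean_concentration_nested(readings):
    # Nested version: the actual computation is buried three levels deep.
    if readings is not None:
        if len(readings) > 0:
            if min(readings) >= 0:
                return sum(readings) / len(readings)
            else:
                raise ValueError("negative reading")
        else:
            raise ValueError("no readings")
    else:
        raise ValueError("readings missing")


def mean_concentration(readings):
    # Guard clauses: reject each invalid case first, then do the real work
    # at the top indentation level.
    if readings is None:
        raise ValueError("readings missing")
    if len(readings) == 0:
        raise ValueError("no readings")
    if min(readings) < 0:
        raise ValueError("negative reading")
    return sum(readings) / len(readings)


print(mean_concentration([1.0, 2.0, 3.0]))
```

Both functions behave the same; the second one just reads top to bottom without the reader having to track which `else` belongs to which `if`.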

@abhibeckert@lemmy.world · edit-2 1 year ago

Can anyone recommend good resources for learning programming

Honestly? No. The best resource is you. Ask questions. Get experience. Ask questions. Get experience. Repeat.

It’s not enough to learn. You also have to do. And you really should learn by doing in this field.

First of all - fuck Python. I’m sure it’s possible to write good code in that language, but it’s not easy and it requires a lot of discipline. I don’t mean to be mean to Python, it’s a truly wonderful language, arguably one of the best languages, when used properly. But it sounds like you’re not using it properly.

Pick a language that:

Has static typing
Does not do garbage collection

Static typing forces you to have more structure in your code. You can have that structure in a dynamic language, but nobody ever does in practice, and part of the reason is that all of the libraries and third party code you interact with assume you have dynamic typing as a crutch to quickly and easily solve hard-to-solve problems.

It’s far better to actually solve those problems, rather than avoid them. You’ll tend to create code where bugs are caught when you write the code instead of when someone else executes the code. That “you vs someone else” distinction is a MASSIVE time saver in practice. It took me about 20 years, but I have learned dynamic typing sucks. It’s convenient, but it sucks.
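For anyone staying in Python anyway, type annotations plus an external checker recover part of the “caught when you write it” benefit. A minimal sketch, assuming a static checker such as mypy is installed separately (Python itself does not enforce annotations at run time); `total_mass` is a made-up function:

```python
def total_mass(masses_kg: list[float]) -> float:
    """Sum a list of masses given in kilograms."""
    return sum(masses_kg)


# Correct call: a list of floats.
print(total_mass([1.5, 2.5]))

# Mistake: a string slipped in (say, read from a CSV and never converted).
# Plain Python only fails when this line actually runs, with a TypeError.
# A static checker can flag the call before the script is ever executed,
# because "2.5" is a str and the annotation asks for float.
try:
    total_mass([1.5, "2.5"])
except TypeError as error:
    print(f"caught only at run time: {error}")
```

This is weaker than a language where the compiler refuses to build the program, since running the checker is optional, but it moves some bugs from “when someone executes the code” to “when you write the code”.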

For more info: https://hackernoon.com/i-finally-understand-static-vs-dynamic-typing-and-you-will-too-ad0c2bd0acc7

On garbage collection - it’s a similar issue. It’s really convenient to write code where “someone else” deals with all that memory management “garbage” for you but the reality is you should be thinking about memory when you write your code because, at its heart, 99.999% of the code you write is in fact just moving memory around. Garbage collection is like “driving” a Tesla with autopilot active. You’re not really driving at all. And you can do a better job if you grab that wheel and do it yourself.

I recommend starting with a manually memory managed language (like Rust) to learn how it works, and then from there you might try a language that does most of the work for you without completely taking it out of your hands (for example Swift, which has “automatic” memory management for common situations but it’s not a garbage collector and in some edge cases you need to step in and take over… a bit like cruise control in a car if we’re going to use that analogy).

It’s getting harder these days to find a language that doesn’t have garbage collection. The industry has gone decades thinking GC is a good idea and we just need one more fix, which we’re working on, to fix that edge case where it fucks up… and then we find another edge case, and another, and another… it’s a bit of a mess and entire papers have been written on the subject. But anyway some of the best and newest languages (Rust, Swift, etc) don’t have Garbage Collection, which is nice (because writing code in C or Fortran sucks — I’m not recommending that).

That’s enough for now. Just keep muddling about learning those languages first before trying to tackle bigger problems. Programming is a difficult task, just like a baby learns to sit up, then roll over, then crawl, then stand, then walk with assistance, then stumble around, then walk, then run, then ride a bicycle with three wheels, then a two wheel one with no pedals, then a bicycle with pedals, then a car after that…

You skipped all those steps and went straight to driving a car (with autopilot). To learn properly, you don’t need to go all the way back to “sitting up and crawling”, but you should maybe go back just a little bit. Figure out how to get code to run, at all, in a language like Rust, and get familiar with it.

After you’ve done that come back here and ask what’s next. We can talk about SOLID, Test Driven Development, all the intricacies of project management in git, exceptions vs returning an error vs failing silently, and when to use third party code vs writing your own (phew boy that’s a big one…).

But for now - just learn a lower level language. Programming is a bit like physics. You’ve got elements, and under that atoms, and under that… well I don’t even know what’s under that (you’re the scientist not me). There are “layers” to programming and it’s important to work at the right layer and it’s also important to understand the layer below and above the one you’re working at.

If Python is at layer x, then you really need to learn layer x-1 in order to be good at Python. You don’t need to go all the way down - you can’t go all the way down (how do magnets work?).

Ephera · 1 year ago

Could be good to try to ‘reset’ your brain, by learning an entirely new programming language. Ideally, a statically typed, strict language like Rust or Java, or Scala, if you happen to have a use for it in data processing. They’ll partially force you to do it the proper way, which can be eye-opening and will translate backwards to Python et al.
Just in general, getting presented the condensate of a different approach to programming, by learning a new language, can teach a lot about programming, even if you’re never going back to that language.

For learning more about Git, I can recommend Oh My Git!. It takes a few hours to get through. In my experience, it’s really useful to have at least seen all the tools Git provides, because if something goes sideways, you can remedy it with that.

@testeronious@lemmy.world · edit-2 1 year ago

deleted by creator

@ericjmorey@programming.dev · 1 year ago

Do you want to work as a developer? Or do you want to continue with your research and analysis? If you’re only writing code for your own purposes, I don’t know why it matters if it’s conventional.

@agent_flounder@lemmy.world · 1 year ago

I guess if you are unlikely to go back and change it, or understand how it works, then sure. And yeah that happens.

I write scripts and utilities like that. Modularity is overkill although I do toss in a comment or two to give a hint to future me, just in case.

Although tbf, I took plenty of CS classes and some of the instructors beat best practices into our heads… So writing sloppy, arcane, spaghetti code causes me to flinch…

Elise · 1 year ago

I’ve got two tips to add to the pile you’ve already read.

I recommend you read the manuals related to what you are using. Have you read the python manual? And the ones for the libraries you use? If you do you’ll definitely find something very useful that you didn’t know about.

That and, reread your code. Over and over until it makes total sense, and only run it then. It might seem slow, and it’ll require patience at first. Running and testing it will always be slower and is generally only useful when testing out the concept you had in mind. But as long as you’re doing your conceptual work right, this shouldn’t happen often. And so, most work will be spent trying to track down bugs in the implementation of the concept in the code. Trust me, when you read your code rigorously you’ll immediately find issues. In some cases use temporary prints. Oh and avoid the debugger.

@Aceticon@lemmy.world · edit-2 1 year ago

Most of the “conventions” (which are normally just “good practices”) are there to make the software easier to maintain, to make teamwork more efficient, to manage complexity in large code-bases, to reduce the chance of mistakes and to give a little boost in productivity.

For example, using descriptive names for variables (i.e. “sampleDataPoints” rather than “x”) reduces the chances of mistakes due to confusing variables (especially in long stretches of code) and allows others (and yourself if you don’t look at that code for many months) to pick up much faster what’s going on there in order to change it. Dividing your code into functions, on the other hand, promotes reusability of the same code in many places without the downsides of copy & paste of the same code all over the place, such as growing the code base (which makes it costlier to maintain) and, worse, unwittingly copying and pasting bugs so now you have to fix the same stuff in several places (and might even forget one or two) rather than just fixing it in that one function.
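A minimal sketch of both points in Python, with made-up data and names (written in snake_case, the usual Python convention, rather than the camelCase used above):

```python
# Before: cryptic names and the same formula pasted twice.
# A bug in the formula would have to be found and fixed in both places.
x = [12.1, 11.8, 12.4]
y = [30.2, 29.9, 30.6]
a = sum(x) / len(x)
b = sum(y) / len(y)


# After: one function holds the formula, and the names say what the data is.
def average(data_points):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(data_points) / len(data_points)


control_samples = [12.1, 11.8, 12.4]
treated_samples = [30.2, 29.9, 30.6]
control_mean = average(control_samples)
treated_mean = average(treated_samples)

print(control_mean, treated_mean)
```

The results are identical; what changes is that a fix to `average` applies everywhere it is used, and nobody has to remember months later which of `a` and `b` was the control group.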

Stuff at a higher, software design level, such as classes, is meant to help structure the code into self-contained blocks with clear, well controlled ways of interaction between them, thus reducing overall complexity (everything potentially connecting to everything else is the most complex web of connections you could have), increasing productivity (less stuff to consider at any one point whilst doing some code, as it can’t access everything), reducing bugs (less possibility of mistakes when certain things can only be changed by a certain part of the code) and making it easier for others to use your stuff (they don’t need to know how your classes work, only how to talk to them, like a mini library). That said, it’s perfectly feasible to achieve a similar result as classes without using classes and using scope only, though more advanced features of classes such as inheritance won’t be possible to easily emulate like that.
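To make the “classes versus scope only” comparison concrete, here is a minimal sketch in Python. `RunningMean` and `make_running_mean` are made-up names; both keep a running total that outside code is not supposed to touch directly.

```python
class RunningMean:
    """Class version: by convention, the underscore-prefixed state is
    changed only through the methods (Python does not enforce this)."""

    def __init__(self):
        self._total = 0.0
        self._count = 0

    def add(self, value):
        self._total += value
        self._count += 1

    def mean(self):
        return self._total / self._count


def make_running_mean():
    """Scope-only version: the state lives in the enclosing function's scope
    and is reachable only through the two inner functions returned below."""
    total = 0.0
    count = 0

    def add(value):
        nonlocal total, count
        total += value
        count += 1

    def mean():
        return total / count

    return add, mean


tracker = RunningMean()
add, mean = make_running_mean()
for reading in [2.0, 4.0, 6.0]:
    tracker.add(reading)
    add(reading)

print(tracker.mean(), mean())
```

Both give the same answer. The class version becomes the more natural choice once several related objects need to share or extend behaviour, which is the inheritance case mentioned above.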

That said, if your programs are small, pretty much one use (i.e. you don’t have to keep on using them for years) and you’re not having to work on the code as a team, you can get away with not using most “conventions” (certainly the design level stuff) with only the downside of some loss in productivity (you lose code clarity and simplification, which increases the likelihood of bugs and makes it slower to traverse and spot stuff in the code when you have to go back and forth to change things).

I’ve worked with people who weren’t programmers but did code (namely with Quants in Finance) and they’re simply not very good at doing what is but a secondary job for them (Quants mainly do Mathematical modelling), which is absolutely normal because, unlike with actual Developers, doing code well and efficiently is not what their focus has been on for years.

@RandomUser@lemmy.world · 1 year ago

deleted by creator

@ericjmorey@programming.dev · 1 year ago

“Pirate for it” was probably the wrong phrase. “Plan for it” was probably what you were thinking when your fingers did something else.

@RandomUser@lemmy.world · 1 year ago

deleted by creator

Mister Neon · 1 year ago

Check Udemy for courses and wait for a sale. They normally list for hundreds of dollars but routinely (pretty much monthly) go on sale for about $10 to $15.

@eveninghere@beehaw.org · edit-2 1 year ago

Computer scientist here. First, let me dare ask the scientists here a question, as a friendly fellow: do you have references for your suggestions?

Code Complete 2 is a book on software engineering with plenty of proper references. Software engineering is important because you learn how to work efficiently. I have been involved in plenty of bad science code projects that wasted taxpayers’ money because of the naivety of the programmers and team management.

The book explains how and why software construction can become expensive and what to do about it, covering a vast range of topics agreed on by industrial and academic experts.

One caveat, however, is that theories are theories. Even best practices are theories. Often, a young programmer tries to force some practice without checking the reality. You know you can reuse your function to reduce the chance of bugs and save time. But have you tested if that is really the case? Nobody can tell unless you test, or ask your team members if that’s a good idea. I’ve spent a good chunk of time on refactoring that didn’t matter. Yet, some mattered.

The importance of that reality check is emphasized in the book Software Architecture: The Hard Parts, for example.

Now, classes, or OOP, have been led by the industry to solve its problems. Often, like in the case of Java, it was partly a solution for a large team. For them it was important to collaborate while reducing the chance of shooting someone accidentally. So, for a scientific project OOP is sometimes irrelevant, and sometimes relevant. Code size is one factor to determine the effectiveness of OOP, but other factors also exist.

Python uses OOP for providing flexibility (here I actually mean polymorphism to be precise), and sometimes it becomes necessary to use this pattern as some packages rely on it.

One problem with Python’s OOP is that it inherits implementation. Recent languages seem to avoid this particular type of OOP because its major rival, what is called composition, has proven over time to make the program’s behavior easier to predict.
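A small sketch of the difference, with made-up class names. Note that Python supports both styles; the comparison is about which one you choose, not what the language allows.

```python
class CsvLoader:
    """Reads comma-separated numbers from a line of text."""

    def load(self, text):
        return [float(field) for field in text.split(",")]


# Inheritance: the analysis *is a* loader, so it silently picks up every
# method the parent has now or gains later.
class InheritedAnalysis(CsvLoader):
    def mean(self, text):
        values = self.load(text)
        return sum(values) / len(values)


# Composition: the analysis *has a* loader and uses only what it asks for.
# Any object with a compatible load() method can be swapped in.
class ComposedAnalysis:
    def __init__(self, loader):
        self._loader = loader

    def mean(self, text):
        values = self._loader.load(text)
        return sum(values) / len(values)


class SemicolonLoader:
    """A second loader that shares no base class with CsvLoader."""

    def load(self, text):
        return [float(field) for field in text.split(";")]


print(InheritedAnalysis().mean("1,2,3"))
print(ComposedAnalysis(CsvLoader()).mean("1,2,3"))
print(ComposedAnalysis(SemicolonLoader()).mean("1;2;3"))
```

With composition, changing `CsvLoader` can only affect `ComposedAnalysis` through the one `load()` call, which is what makes the behavior easier to predict. The last line also shows the polymorphism mentioned earlier: the analysis works with any loader that provides `load()`.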

To me, writing Python is also often easier with OOP. One popular alternative to OOP is what is called a functional approach, but that is unfortunately not well-supported in Python.
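For reference, this is roughly what a functional style looks like with the tools Python does ship (`map`, `filter`, `functools.reduce`, comprehensions). It is a minimal sketch with made-up data and function names; whether this counts as well supported is a matter of opinion.

```python
from functools import reduce

# Made-up data; -300.0 is below absolute zero and therefore impossible.
readings_celsius = [21.5, -300.0, 22.0, 23.5]


def is_physical(celsius):
    return celsius >= -273.15


def to_kelvin(celsius):
    return celsius + 273.15


# Small pure functions are chained together; the input list is never modified.
kelvin = list(map(to_kelvin, filter(is_physical, readings_celsius)))
total = reduce(lambda accumulated, value: accumulated + value, kelvin, 0.0)

# The same pipeline written as a comprehension, often considered more readable
# in Python.
kelvin_again = [to_kelvin(c) for c in readings_celsius if is_physical(c)]

print(len(kelvin), kelvin == kelvin_again)
```

The style works for simple pipelines like this one; it tends to get awkward in Python once the steps need multi-line logic, since lambdas are limited to a single expression.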

Finally, Automate the Boring Stuff With Python is a great resource on doing routine tasks quickly. Also, pick some Pandas book and get used to its APIs because it improves productivity to a great extent. (I could even cite an article on this! But I don’t have the reference at hand.)

Oh, don’t forget ChatGPT and Gemini.